Section: New Results

Floating-point arithmetic

Parallel floating-point expansions for extended-precision GPU computations

GPUs are an important hardware development platform for problems where massively parallel computations are needed. Many of these problems require a higher precision than the standard double-precision floating-point (FP) format available in hardware. One common way of extending the precision is the multiple-component approach, in which real numbers are represented as the unevaluated sum of several standard machine-precision FP numbers. This representation is called an FP expansion, and it offers the simplicity of using directly available and highly optimized FP operations. In [30] we present new data-parallel algorithms for adding and multiplying FP expansions specially designed for extended-precision computations on GPUs. These are generalized algorithms that can manipulate FP expansions of different sizes (from double-double up to a few tens of doubles) and ensure a certain worst-case error bound on the results.
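
As a minimal illustration of the representation (not the data-parallel GPU algorithms of [30] themselves), the following C sketch shows an FP expansion as an array of doubles and the error-free accumulation of one double into it with a chain of 2Sum steps; the names fp_expansion, two_sum and expansion_add_double are ours.

    #include <stddef.h>

    /* An FP expansion: the represented value is the unevaluated sum x[0] + ... + x[n-1]. */
    typedef struct { double x[8]; size_t n; } fp_expansion;   /* illustrative fixed capacity */

    /* 2Sum: s = RN(a+b) and e such that a + b = s + e exactly (round-to-nearest assumed). */
    static void two_sum(double a, double b, double *s, double *e) {
        *s = a + b;
        double a1 = *s - b;
        double b1 = *s - a1;
        *e = (a - a1) + (b - b1);
    }

    /* Accumulate one double into the expansion, propagating the 2Sum error terms.
       The sum of the components is preserved exactly; a renormalization pass
       (omitted here) is still needed to restore a non-overlapping expansion. */
    static void expansion_add_double(fp_expansion *xs, double v) {
        for (size_t i = 0; i < xs->n; i++) {
            double e;
            two_sum(xs->x[i], v, &xs->x[i], &e);
            v = e;                              /* leftover error goes to the next component */
        }
        if (xs->n < 8) xs->x[xs->n++] = v;      /* keep the final error as a new component */
    }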

Error analysis of the Cornea-Harrison-Tang method

Assuming floating-point arithmetic with a fused multiply-add operation and rounding to nearest, the Cornea-Harrison-Tang method aims to evaluate expressions of the form $ab+cd$ with high relative accuracy. In [12] we provide a rounding error analysis of this method, which unlike previous studies is not restricted to binary floating-point arithmetic but holds for any radix $\beta$. We show first that an asymptotically optimal bound on the relative error of this method is $\frac{2\beta u+2u^2}{\beta-2u^2}=2u+\frac{2}{\beta}u^2+O(u^3)$, where $u=\frac{1}{2}\beta^{1-p}$ is the unit roundoff in radix $\beta$ and precision $p$. Then we show that the possibility of removing the $O(u^2)$ term from this bound is governed by the radix parity and the tie-breaking strategy used for rounding: if $\beta$ is odd or rounding is to nearest even, then the simpler bound $2u$ is obtained, while if $\beta$ is even and rounding is to nearest away, then there exist floating-point inputs $a,b,c,d$ that lead to a relative error larger than $2u+\frac{2}{\beta}u^2-4u^3$. All these results hold provided underflows and overflows do not occur and under some mild assumptions on $\beta$ and $p$ satisfied by the IEEE 754-2008 formats.
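
For reference, a C sketch of the FMA-based scheme analyzed in [12], written here for binary64 (the analysis itself is radix-independent); the function name is ours.

    #include <math.h>

    /* Evaluate ab + cd using two exact FMA-based products: each product is split
       as p + e with p = RN(ab) and e = fma(a, b, -p), so that ab = p + e exactly;
       the four pieces are then summed. */
    static double ab_plus_cd(double a, double b, double c, double d) {
        double p1 = a * b;
        double e1 = fma(a, b, -p1);   /* ab = p1 + e1 exactly */
        double p2 = c * d;
        double e2 = fma(c, d, -p2);   /* cd = p2 + e2 exactly */
        return (p1 + p2) + (e1 + e2);
    }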

Sharp error bounds for complex floating-point inversion

In [14] we study the accuracy of the classic algorithm for inverting a complex number given by its real and imaginary parts as floating-point numbers. Our analyses are done in binary floating-point arithmetic, with an unbounded exponent range and in precision $p$; we also assume that the basic arithmetic operations ($+$, $-$, $\times$, $/$) are rounded to nearest, so that the unit roundoff is $u=2^{-p}$. We bound the largest relative error in the computed inverse either in the componentwise or in the normwise sense. We prove the componentwise relative error bound $3u$ for the complex inversion algorithm (assuming $p \ge 4$), and we show that this bound is asymptotically optimal (as $p \to \infty$) when $p$ is even, and sharp when using one of the basic IEEE 754 binary formats with an odd precision ($p=53,113$). This componentwise bound obviously leads to the same bound $3u$ for the normwise relative error. However, we prove that the smaller bound $2.707131u$ holds (assuming $p \ge 24$) for the normwise relative error, and we illustrate the sharpness of this bound for the basic IEEE 754 binary formats ($p=24,53,113$) using numerical examples.
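
The classic algorithm in question simply divides the conjugate by the squared modulus, each operation being rounded to nearest; a C sketch (our transcription):

    /* Inversion of x = a + i*b via 1/x = a/(a^2+b^2) - i * b/(a^2+b^2).
       All operations are rounded to nearest; the analysis in [14] assumes an
       unbounded exponent range, so no scaling against overflow/underflow is done here. */
    static void complex_inverse(double a, double b, double *re, double *im) {
        double d = a * a + b * b;   /* rounded in three operations */
        *re =  a / d;
        *im = -b / d;
    }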

On relative errors of floating-point operations: optimal bounds and applications

Rounding error analyses of numerical algorithms are most often carried out via repeated applications of the so-called standard models of floating-point arithmetic. Given a round-to-nearest function $\mathrm{fl}$ and barring underflow and overflow, such models bound the relative errors $E_1(t)=|t-\mathrm{fl}(t)|/|t|$ and $E_2(t)=|t-\mathrm{fl}(t)|/|\mathrm{fl}(t)|$ by the unit roundoff $u$. With S. M. Rump (Hamburg University of Technology), we investigate in [15] the possibility and the usefulness of refining these bounds, both in the case of an arbitrary real $t$ and in the case where $t$ is the exact result of an arithmetic operation on some floating-point numbers. We show that $E_1(t)$ and $E_2(t)$ are optimally bounded by $u/(1+u)$ and $u$, respectively, when $t$ is real or, under mild assumptions on the base and the precision, when $t=x \pm y$ or $t=xy$ with $x,y$ two floating-point numbers. We prove that while this remains true for division in base $\beta>2$, smaller, attainable bounds can be derived for both division in base $\beta=2$ and square root. This set of optimal bounds is then applied to the rounding error analysis of various numerical algorithms: in all cases, we obtain significantly shorter proofs of the best-known error bounds for such algorithms, and/or improvements on these bounds themselves.
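
For intuition, a standard worked instance in radix 2 and precision $p$ (our example) shows that both bounds are attained for a real input $t$:

    % Take t = 1 + u with u = 2^{-p}. The floating-point neighbours of t are 1 and 1 + 2u,
    % so t is a rounding midpoint; with ties-to-even, fl(t) = 1, and therefore
    \[
      E_1(t) = \frac{|t-\mathrm{fl}(t)|}{|t|} = \frac{u}{1+u},
      \qquad
      E_2(t) = \frac{|t-\mathrm{fl}(t)|}{|\mathrm{fl}(t)|} = u .
    \]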

Computing floating-point logarithms with fixed-point operations

Elementary functions from the mathematical library input and output floating-point numbers. However, it is possible to implement them purely using integer/fixed-point arithmetic. This option was not attractive between 1985 and 2005, because mainstream processor hardware supported 64-bit floating-point but only 32-bit integers, and conversions between floating-point and integer were costly. This has changed in recent years, in particular with the generalization of native 64-bit integer support. The purpose of this article is therefore to reevaluate the relevance of computing floating-point functions in fixed point. For this, several variants of the double-precision logarithm function are implemented and evaluated. Formulating the problem as a fixed-point one is easy after the range has been (classically) reduced. Then, 64-bit integers provide slightly more accuracy than the 53-bit mantissa, which helps speed up the evaluation. Finally, multi-word arithmetic, critical for accurate implementations, is much faster in fixed point and natively supported by recent compilers. Novel techniques for argument reduction and for the rounding test are introduced in this context. Thanks to all this, a purely integer implementation of the correctly rounded double-precision logarithm outperforms the previous state of the art, with the worst-case execution time reduced by a factor of 5. This work also introduces variants of the logarithm that input a floating-point number and output the result in fixed point. These are shown to be both more accurate and more efficient than the traditional floating-point functions for some applications [35].
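
A small C sketch of the kind of reformulation involved (our illustration, not the implementation of [35]): after writing $x = 2^e \cdot m$ with $m \in [1,2)$, the significand is turned into a 64-bit fixed-point number, the polynomial part of $\log m$ can then be evaluated with integer arithmetic, and $\log x = e \log 2 + \log m$.

    #include <stdint.h>
    #include <string.h>

    /* Decompose a positive normal binary64 x into x = 2^e * m, m in [1,2), and
       return m as a Q2.62 fixed-point value (i.e., m scaled by 2^62, exactly).
       The polynomial evaluation of log(m) in 64-bit integer arithmetic and the
       final reconstruction are omitted here. */
    static uint64_t significand_as_q2_62(double x, int *e) {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);                  /* raw IEEE 754 encoding */
        *e = (int)((bits >> 52) & 0x7FF) - 1023;         /* unbiased exponent */
        uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);
        return (UINT64_C(1) << 62) | (frac << 10);       /* m = 1 + frac/2^52, times 2^62 */
    }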

A library for symbolic floating-point arithmetic

To analyze a priori the accuracy of an algorithm in floating-point arithmetic, one usually derives a uniform error bound on the output, valid for most inputs and parametrized by the precision p. To show further that this bound is sharp, a common way is to build an input example for which the error committed by the algorithm comes close to that bound, or even attains it. Such inputs may be given as floating-point numbers in one of the IEEE standard formats (say, for p=53) or, more generally, as expressions parametrized by p, that can be viewed as symbolic floating-point numbers. With such inputs, a sharpness result can thus be established for virtually all reasonable formats instead of just one of them. This, however, requires the ability to run the algorithm on those inputs and, in particular, to compute the correctly-rounded sum, product, or ratio of two symbolic floating-point numbers. We show in [61] how these basic arithmetic operations can be performed automatically. We introduce a way to model symbolic floating-point data, and present algorithms for round-to-nearest addition, multiplication, fused multiply-add, and division. An implementation as a Maple library is also described, and experiments using examples from the literature are provided to illustrate its interest in practice.
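
A toy instance of what running an algorithm on symbolic floating-point inputs means (our example, not one from [61]): in radix 2 and precision $p \ge 2$, take the symbolic inputs

    \[
      x = 1 + 2^{1-p} \quad\text{(the successor of 1)}, \qquad
      y = 1 - 2^{-p} \quad\text{(the predecessor of 1)}.
    \]
    % Their exact product is xy = 1 + 2^{-p} - 2^{1-2p}, which lies strictly between 1 and
    % the midpoint 1 + 2^{-p}, so the correctly rounded product is RN(xy) = 1 for every p >= 2.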

On the robustness of the 2Sum and Fast2Sum algorithms

The 2Sum and Fast2Sum algorithms are important building blocks in numerical computing. They are used (implicitly or explicitly) in many compensated algorithms (such as compensated summation or compensated polynomial evaluation). They are also used for manipulating floating-point expansions. We show in [56] that these algorithms are much more robust than is usually believed: the returned result makes sense even when the rounding function is not round-to-nearest, and they are almost immune to overflow.
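
For reference, the textbook C formulations (with round-to-nearest and barring overflow, $s + t = a + b$ exactly; Fast2Sum additionally assumes that the exponent of $a$ is at least that of $b$, e.g. $|a| \ge |b|$):

    /* 2Sum: 6 floating-point operations, no condition on the inputs. */
    static void two_sum(double a, double b, double *s, double *t) {
        *s = a + b;
        double a1 = *s - b;
        double b1 = *s - a1;
        *t = (a - a1) + (b - b1);
    }

    /* Fast2Sum: 3 floating-point operations, assumes exponent(a) >= exponent(b). */
    static void fast_two_sum(double a, double b, double *s, double *t) {
        *s = a + b;
        double z = *s - a;
        *t = b - z;
    }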

Tight and rigorous error bounds for basic building blocks of double-word arithmetic

In [63] we analyze several classical basic building blocks of double-word arithmetic (frequently called “double-double arithmetic” in the literature): the addition of a double-word number and a floating-point number, the addition of two double-word numbers, the multiplication of a double-word number by a floating-point number, the multiplication of two double-word numbers, the division of a double-word number by a floating-point number, and the division of two double-word numbers. For multiplication and division we get better relative error bounds than the ones previously published. For the addition of two double-word numbers, we show that the previously published bound was incorrect, and we provide a correct relative error bound. We also introduce new algorithms for division, and give examples that illustrate the tightness of our bounds.
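
As an illustration of one such building block, here is a common way of adding a double-word number $(x_h, x_l)$ and a floating-point number $y$, written in C with the 2Sum/Fast2Sum primitives recalled above (a sketch in the spirit of the algorithms analyzed in [63]; the exact variants and their error bounds are in the paper):

    /* Error-free transforms, round-to-nearest assumed. */
    static void two_sum(double a, double b, double *s, double *e) {
        *s = a + b;
        double a1 = *s - b, b1 = *s - a1;
        *e = (a - a1) + (b - b1);
    }
    static void fast_two_sum(double a, double b, double *s, double *e) {
        *s = a + b;
        *e = b - (*s - a);
    }

    /* Double-word + double: input (xh, xl) with |xl| <= ulp(xh)/2; output (zh, zl). */
    static void dw_plus_fp(double xh, double xl, double y, double *zh, double *zl) {
        double sh, sl;
        two_sum(xh, y, &sh, &sl);        /* xh + y = sh + sl exactly */
        double v = xl + sl;              /* rounded addition of the low parts */
        fast_two_sum(sh, v, zh, zl);     /* renormalize into a double-word result */
    }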

A new multiplication algorithm for extended precision using floating-point expansions

Some important computational problems must use a floating-point (FP) precision several times higher than the one available in hardware. These computations critically rely on software libraries for high-precision FP arithmetic. The representation of a high-precision data type crucially influences the corresponding arithmetic algorithms. Recent work showed that algorithms for FP expansions, that is, a representation based on the unevaluated sum of standard FP types, benefit from various kinds of high-performance support for native FP, such as low latency, high throughput, vectorization, threading, etc. Bailey’s QD library and its corresponding Graphics Processing Unit (GPU) version, GQD, are such examples. Despite using native FP arithmetic for the key operations, the QD and GQD algorithms are focused on double-double or quad-double representations and do not generalize efficiently or naturally to a flexible number of components in the FP expansion. In [45] we introduce a new multiplication algorithm for FP expansions with flexible precision, designed with up to the order of tens of FP components in mind. Its main feature is that the partial products are accumulated in a specially designed data structure that has the regularity of a fixed-point representation while allowing the computation to be carried out naturally using native FP types. This makes it easy to avoid unnecessary computation and to give a rigorous and transparent accuracy analysis. The algorithm, its correctness and accuracy proofs, and some performance comparisons with existing libraries are all contributions of this paper.
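
The partial products themselves are obtained exactly with the usual FMA-based error-free multiplication, the basic primitive on which such expansion multiplications typically rely; a C sketch:

    #include <math.h>

    /* 2ProdFMA: p = RN(a*b) and e = fma(a, b, -p), so that a*b = p + e exactly
       (barring underflow/overflow). Each partial product x_i * y_j can be split
       this way before being accumulated. */
    static void two_prod_fma(double a, double b, double *p, double *e) {
        *p = a * b;
        *e = fma(a, b, -*p);
    }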

CAMPARY: Cuda Multiple Precision Arithmetic Library and Applications

Many scientific computing applications demand massive numerical computations on parallel architectures such as Graphics Processing Units (GPUs). Usually, either single- or double-precision floating-point arithmetic is used. Higher precision is generally not available in hardware, and software extended-precision libraries are much slower and rarely supported on GPUs. We develop CAMPARY, a multiple-precision arithmetic library written in the CUDA programming language for the NVidia GPU platform. In our approach, the precision is extended by representing real numbers as the unevaluated sum of several standard machine-precision floating-point numbers. We make use of error-free transform algorithms, which rely only on native-precision operations but keep track of all rounding errors that occur when performing a sequence of additions and multiplications. This offers the simplicity of using highly optimized hardware floating-point operations, while also allowing for rigorously proven rounding error bounds. It also allows for an easy implementation of interval arithmetic. Currently, all basic multiple-precision arithmetic operations are supported. Our target applications are in chaotic dynamical systems and automatic control [34].

Arithmetic algorithms for extended precision using floating-point expansions

Many numerical problems require a higher computing precision than the one offered by standard floating-point (FP) formats. One common way of extending the precision is to represent numbers in a multiple-component format. With the so-called floating-point expansions, real numbers are represented as the unevaluated sum of standard machine-precision FP numbers. This representation offers the simplicity of using directly available, hardware-implemented and highly optimized FP operations. It is used by multiple-precision libraries such as Bailey's QD or its GPU-tuned analogue, GQD. In this article we briefly revisit algorithms for adding and multiplying FP expansions, then we introduce and prove new algorithms for normalizing, dividing and computing square roots of FP expansions. The new method used for computing the reciprocal $a^{-1}$ and the square root $\sqrt{a}$ of an FP expansion $a$ is based on an adapted Newton-Raphson iteration where the intermediate calculations are done using “truncated” operations (additions, multiplications) involving FP expansions. We give a thorough error analysis showing that it allows very accurate computations. More precisely, after $q$ iterations, the computed FP expansion $x=x_0+\cdots+x_{2^q-1}$ satisfies, for the reciprocal algorithm, the relative error bound $|(x-a^{-1})/a^{-1}| \le 2^{-2^q(p-3)-1}$ and, for the square root one, $|x-1/\sqrt{a}| \le 2^{-2^q(p-3)-1}/\sqrt{a}$, where $p>2$ is the precision of the FP representation used ($p=24$ for single precision and $p=53$ for double precision) [16].
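
The classical Newton-Raphson iterations underlying the adapted scheme (written here over the reals; in [16] the operations are truncated FP-expansion additions and multiplications) are:

    % Reciprocal of a (root of f(x) = 1/x - a), with quadratic convergence:
    \[
      x_{n+1} = x_n\,(2 - a x_n), \qquad 1 - a x_{n+1} = (1 - a x_n)^2 .
    \]
    % Reciprocal square root (root of f(x) = 1/x^2 - a); sqrt(a) is then recovered as a * (1/sqrt(a)):
    \[
      x_{n+1} = \frac{x_n}{2}\,\bigl(3 - a x_n^2\bigr).
    \]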

Comparison between binary and decimal floating-point numbers

We introduce an algorithm to compare a binary floating-point (FP) number and a decimal FP number, assuming the “binary encoding” of the decimal formats is used, and with a special emphasis on the basic interchange formats specified by the IEEE 754-2008 standard for FP arithmetic. It is a two-step algorithm: a first pass, based on the exponents only, quickly eliminates most cases, then, when the first pass does not suffice, a more accurate second pass is performed. We provide an implementation of several variants of our algorithm, and compare them [8].
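
A rough C illustration of the idea behind the first pass (ours; the actual test in [8] is exact and integer-only): for positive $x = m_x \cdot 2^{e_x}$ with $m_x \in [1,2)$ and $y = m_y \cdot 10^{f}$ with $m_y \in [1,10)$, the magnitude ranges $[2^{e_x}, 2^{e_x+1})$ and $[10^{f}, 10^{f+1})$ usually do not overlap, in which case the exponents alone decide the comparison.

    /* Returns -1 if |x| < |y|, +1 if |x| > |y|, and 0 when the exponents alone are
       not conclusive (the accurate second pass is then needed). */
    static int compare_by_exponents(int ex, int f) {
        const double LOG2_10 = 3.321928094887362;        /* log2(10), approximately */
        if ((double)(ex + 1) <= f * LOG2_10) return -1;   /* 2^(ex+1) <= 10^f  =>  |x| < |y| */
        if ((f + 1) * LOG2_10 <= (double)ex) return +1;   /* 10^(f+1) <= 2^ex  =>  |x| > |y| */
        return 0;
    }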

Automatic source-to-source error compensation of floating-point programs: code synthesis to optimize accuracy and time

Numerical programs with IEEE 754 floating-point computations may suffer from inaccuracies, since finite-precision arithmetic is only an approximation of real arithmetic. Solutions that reduce the loss of accuracy are available, such as compensated algorithms or double-double precision floating-point arithmetic. With Ph. Langlois and M. Martel (LIRMM and Université de Perpignan), we show in [21] how to automatically improve the numerical quality of a numerical program with the smallest impact on its performance. We define and implement source code transformations in order to derive automatically compensated programs. We present several experimental results to compare the transformed programs and existing solutions. The transformed programs are as accurate and efficient as the implementations of compensated algorithms when the latter exist. Furthermore, we propose transformation strategies that allow us to partially improve the accuracy of programs and to tune the impact on execution time. Trade-offs between accuracy and performance are enabled by code synthesis. Experimental results show that user-defined trade-offs are achievable in a reasonable amount of time, with the help of the tools we present here.
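
To make “compensated program” concrete, here is the classic compensated (Kahan-style) summation in C, the kind of code such a source-to-source transformation can produce from a naive summation loop (an illustration, not output of the tool described in [21]):

    #include <stddef.h>

    /* Compensated summation: the rounding error of each addition is estimated
       on the fly and re-injected as a correction at the next step. */
    static double compensated_sum(const double *x, size_t n) {
        double s = 0.0, c = 0.0;        /* running sum and running correction */
        for (size_t i = 0; i < n; i++) {
            double y = x[i] - c;        /* apply the pending correction */
            double t = s + y;
            c = (t - s) - y;            /* rounding error of s + y, with opposite sign */
            s = t;
        }
        return s;
    }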

Correctly rounded arbitrary-precision floating-point summation

We have designed a fast, low-level algorithm to compute the correctly rounded summation of several floating-point numbers in arbitrary precision in radix 2, each number (each input and the output) having its own precision. We have implemented it in GNU MPFR; it will be part of the next major release (GNU MPFR 4.0). In addition to a pen-and-paper proof, various kinds of tests are provided. Timings show that this new algorithm/implementation is globally much faster and takes less memory than the previous one (from MPFR 3.1.5): the worst-case time and memory complexity was exponential and is now polynomial. Timings on pseudo-random inputs with various sets of parameters also show that in some cases this new implementation is even much faster than the (inaccurate) basic sum implementation [36], [65].
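
A minimal usage sketch in C, assuming the mpfr_sum interface of GNU MPFR 4.0 (the inputs are passed as an array of pointers, and each operand as well as the output carries its own precision):

    #include <stdio.h>
    #include <mpfr.h>

    int main(void) {
        mpfr_t x[3], s;
        mpfr_ptr p[3];

        /* Inputs and output may all have different precisions. */
        mpfr_init2(x[0], 24);   mpfr_set_d(x[0], 1.0,   MPFR_RNDN);
        mpfr_init2(x[1], 113);  mpfr_set_d(x[1], 1e-30, MPFR_RNDN);
        mpfr_init2(x[2], 53);   mpfr_set_d(x[2], -1.0,  MPFR_RNDN);
        mpfr_init2(s, 64);

        for (int i = 0; i < 3; i++) p[i] = x[i];

        /* Correctly rounded sum of the three operands, to the precision of s. */
        mpfr_sum(s, p, 3, MPFR_RNDN);
        mpfr_printf("%.20Re\n", s);

        mpfr_clears(x[0], x[1], x[2], s, (mpfr_ptr) 0);
        return 0;
    }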